


Available online at www.sciencedirect.com



Protein Expression and Purification 52 (2007) 249-257 

www.elsevier.com/locate/yprep 




Expression, purification and characterization of recombinant severe 
acute respiratory syndrome coronavirus non-structural protein 1 

Kimberly Brucz a, Zachary J. Miknis b, L. Wayne Schultz a,b, Timothy C. Umland a,b,*

a Hauptman-Woodward Medical Research Institute, 700 Ellicott Street, Buffalo, NY 14203, USA
b Department of Structural Biology, SUNY at Buffalo, Buffalo, NY 14203, USA 

Received 4 August 2006, and in revised form 6 November 2006 
Available online 14 November 2006 


Abstract 

The coronavirus (CoV) responsible for severe acute respiratory syndrome (SARS), SARS-CoV, encodes two large polyproteins (pp1a and pp1ab) that are processed by two viral proteases to yield mature non-structural proteins (nsps). Many of these nsps have essential roles in viral replication, but several have no assigned function and possess amino acid sequences that are unique to the CoV family. One such protein is SARS-CoV nsp1, which is processed from the N-terminus of both pp1a and pp1ab. The mature SARS-CoV protein is present in cells several hours post-infection and co-localizes with the viral replication complex, but its function in the viral life cycle remains unknown. Furthermore, nsp1 sequences are highly divergent across the CoV family, and it has been suggested that this reflects a function specific to viral interactions with the host cell, or a role as a host-specific virulence factor. To initiate structural and biophysical studies of SARS-CoV nsp1, we developed a recombinant expression system and a purification protocol that yield milligram quantities of highly purified SARS-CoV nsp1. The purified protein was characterized using circular dichroism, size exclusion chromatography, and multi-angle light scattering.

© 2006 Elsevier Inc. All rights reserved. 
The severe acute respiratory syndrome (SARS) outbreak of 2002-2003, followed by a much smaller outbreak in 2004, caused over 8000 illnesses and nearly 800 deaths (World Health Organization; http://www.who.int/csr/sars/country/table2004_04_21/en/index.html). The infectious agent responsible for this disease was quickly identified as a new member of the coronavirus (CoV) family, SARS-coronavirus (SARS-CoV) [1-3], most closely related to the group 2 CoVs [4]. This newly emerged virus prompted renewed interest in CoV research. Prior to the SARS outbreak, only two CoVs (HCoV-229E and HCoV-OC43) were known to infect humans [5]. These two CoVs have been estimated to cause up to 30% of common colds and mild respiratory illnesses [6]. Other CoVs are widespread in both domestic and wild animals, and several have a significant economic impact on the livestock and poultry industries.

Following the emergence of SARS, two additional human CoVs associated with upper and lower respiratory tract diseases were identified. Three groups independently identified in young children what is likely a single CoV species, which has been variously designated NL63, NL, and HCoV-NH [7-9]. The second new CoV was discovered in an elderly patient suffering from pneumonia in Hong Kong and has been designated HKU1 [10]. Both newly identified human CoVs appear to be widespread, especially in children, and have likely been present in a human host reservoir for an extended time.

SARS-CoV has not re-emerged since 2004, but the natural reservoir of the virus has been putatively identified in several related species of Chinese horseshoe bats [11,12]. These bats are sold in Chinese live animal markets and used in traditional Chinese medicine, so re-emergence of SARS is a distinct possibility. Most members of the CoV family infect a narrow range of host species specific to each virus. SARS-CoV and HCoV-OC43 are notable examples of CoVs with documented host range expansions from their original animal reservoirs (bats [11,12] and bovines [13], respectively), acquiring the ability to infect and be transmitted between humans.

Members of the CoV family possess the largest genomes (28-30 kb) of all RNA viruses. Their positive-strand RNA genomes share a common organization throughout the family. Two overlapping open reading frames (ORF1a and ORF1ab) at the genome's 5' end encode two large polyproteins (pp1a and pp1ab) [14]. Viral proteases process pp1a and pp1ab to yield the mature viral non-structural proteins (nsps; nsp1-nsp16 in SARS-CoV) [15,16]. Many of these nsps have been associated with viral replication [17] and so are also referred to as replicase proteins. Several of these nsps are unique to the CoV family or to individual CoVs. For example, SARS-CoV nsp1 exhibits weak sequence similarity only to the nsp1s of several other group 2 CoVs [4]. Functional data are lacking for nsp1 of all CoVs, due at least in part to its lack of homology to any well-characterized protein family and its high variability across the CoV family.

SARS-CoV nsps 1, 2, and 3 are processed by a papain-like protease (PL2pro) contained within nsp3 [15,16]. Both SARS-CoV pp1a and pp1ab contain nsp1 at their respective N-termini. These two studies [15,16] demonstrated that viral PL2pro cleaves both polyproteins at G180|A181, producing the mature SARS-CoV nsp1 containing 180 amino acid residues with a calculated mass of 19.6 kDa. Furthermore, mature SARS-CoV nsp1 was shown to be present in Vero cells at 4-6 h post-infection. SARS-CoV nsp1 was not detected as a component of a larger, partially processed polyprotein intermediate in lysates from the infected cells, indicating that proteolytic processing at the nsp1-nsp2 cleavage site occurs rapidly following synthesis of pp1a and pp1ab. Immunofluorescence experiments revealed that SARS-CoV nsp1 co-localized with other replicase proteins in discrete cytoplasmic foci that were both perinuclear and dispersed throughout the cytoplasm [15,16]. These cytoplasmic foci likely represent SARS-CoV replication complexes, where viral RNA synthesis occurs. These replication complexes form on double-membrane vesicles, which are likely constructed through viral manipulation of the cellular autophagy system [18,19]. Later in infection, a more diffuse distribution of SARS-CoV nsp1 was observed, possibly indicating a change in localization during the viral life cycle or degradation of previously formed foci. Virus release from infected Vero cells occurred 3 to 6 h after nsp1 was first observed [16].

SARS-CoV nsp1 shares weak sequence homology with mouse hepatitis virus (MHV) nsp1 [4], although the mature MHV protein is 8 kDa larger. While comparisons of nsp1 data from MHV and SARS-CoV must be made with caution because of significant sequence differences, results from MHV nsp1 studies suggest an essential role for SARS-CoV nsp1. The N-terminal half of MHV nsp1 has been shown to be essential for producing infectious virus, and point mutations within this region produced virus with altered replication and RNA synthesis [20]. The C-terminal half of MHV nsp1 can be deleted, or the nsp1-nsp2 cleavage site eliminated, and both mutations yield viable virus but with delayed replication and lowered peak titers [20,21].

The cellular co-localization of SARS-CoV nsp1 with other viral nsps known to be essential for viral RNA synthesis and replication indicates that nsp1 may also have a role in these steps of the viral life cycle. The high sequence variability of nsp1 across the CoV family, combined with the tendency of individual members of this family to have a narrow host species range, suggests that nsp1 may have specific host interactions, including suppression of host gene expression [22]. SARS-CoV is known to have expanded its host range from its natural reservoir (bats) to other animals present in live animal markets (e.g., palm civets, raccoon dogs) and to humans. Hence, mutations in nsp1 may have been involved in the evolution of the SARS-CoV host range. To further investigate these possible functions of SARS-CoV nsp1, we have expressed the recombinant protein in Escherichia coli, purified it to homogeneity, and characterized it. In particular, structural studies require large quantities of highly purified, monodisperse protein, and the expression and purification experiments described here were designed with those goals in mind.

Materials and methods 

Cloning of SARS-CoV nsp1

The template for subcloning SARS-CoV nsp1 into an expression vector was a cDNA fragment that encoded nsp1 and a portion of nsp2. This cDNA fragment was generated by reverse transcriptase-polymerase chain reaction (RT-PCR) from SARS-CoV Urbani strain genomic RNA and was a kind gift from Dr. Mark Denison (Vanderbilt University). The cDNA encoding only nsp1, corresponding to bases 265-804 of the SARS-CoV Urbani strain genome (GenBank Accession No. AY278741), was amplified by polymerase chain reaction (PCR) using the forward primer 5'-ATG GAG AGC CTT GTT CTT GGT G-3', the reverse primer 5'-TTA ACC TCC ATT GAG CTC ACG AG-3' and Taq PCR Master Mix (Qiagen). The reverse primer was designed to introduce a STOP codon (TAA) at the 3' end of the sense strand of the PCR product. The PCR was conducted in a standard manner, employing 30 cycles and an annealing temperature of 60 °C on an Applied Biosystems GeneAmp PCR System 2400. The PCR-amplified fragment was inserted into a modified pET15b expression vector diagrammed in Fig. 1a (Yanzhou Wang, unpublished results). The stock pET15b vector (Novagen) encodes an N-terminal (His)6 fusion tag followed by a thrombin cleavage site and a multiple cloning site (MCS) encoding several unique restriction enzyme cleavage sites. The modified vector (Topo-HisGST-YZW) replaced the stock fusion tag, thrombin cleavage site and MCS with an N-terminal (His)6-glutathione S-transferase (GST) fusion tag followed by a tobacco etch virus (TEV) protease cleavage site. This vector was then linearized at the unique XhoI site, and adaptors were added to both ends to provide vaccinia topoisomerase recognition sequences.
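The primer design can be checked with a few lines of arithmetic. The Python sketch below is only an illustrative check, not part of the original protocol: the genome coordinates and primer sequences come from the text, while the helper name `reverse_complement` is our own. It shows that the reverse primer, read on the sense strand, places a TAA codon at the 3' end, and that bases 265-804 plus that stop codon give a 543-nt product encoding 180 residues.

```python
# Illustrative check of the nsp1 amplicon design (not part of the original protocol).
# Primer sequences and genome coordinates are taken from the Methods text.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


FORWARD_PRIMER = "ATGGAGAGCCTTGTTCTTGGTG"   # 5'->3', starts at the nsp1 ATG
REVERSE_PRIMER = "TTAACCTCCATTGAGCTCACGAG"  # 5'->3', carries the added TAA

coding_length = 804 - 265 + 1        # nsp1 coding bases 265-804, inclusive
codons = coding_length // 3          # one codon per residue
amplicon_length = coding_length + 3  # plus the TAA stop codon

# Reading the reverse primer on the sense strand shows where the stop codon lands.
sense_tail = reverse_complement(REVERSE_PRIMER)

print(coding_length, codons, amplicon_length)  # 540 180 543
print(sense_tail, sense_tail.endswith("TAA"))  # CTCGTGAGCTCAATGGAGGTTAA True
```

Read in frame, the sense-strand tail ends ...CGT GAG CTC AAT GGA GGT TAA, i.e. the last nsp1 codons followed directly by the introduced stop.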

The PCR-amplified product, with 3' adenine overhangs due to Taq polymerase activity, was incubated with linear Topo-HisGST-YZW plus topoisomerase at 22 °C for 15 min. Library Efficiency DH5α competent cells (50 µL, Invitrogen) were transformed via heat shock with 3 µL of the ligation reaction, plated onto LB agar plates containing 100 µg/mL ampicillin, and incubated overnight at 37 °C. Nascent colonies were initially screened by PCR with the SARS-CoV nsp1 forward primer and the T7 terminator primer (Novagen), and the PCR products were analyzed by agarose gel electrophoresis to identify transformants possessing an expression vector with a DNA insert of correct size and orientation. Plasmids from colonies identified by this initial screen were purified using the QIAprep Spin Miniprep kit (Qiagen). Purified plasmids (pHisGST-TEV-Snsp1) were sequenced using T7 promoter and terminator primers on an ABI PRISM 3130XL Genetic Analyzer to verify incorporation of cDNA encoding full-length SARS-CoV nsp1 into the vector.

Fig. 1. Expression vector construction. (a) Schematic of the Topo-HisGST-YZW vector. (b) Schematic of the pHisGST-TEV-Snsp1 expression vector coding for SARS-CoV nsp1 with an N-terminal polyhistidine-GST dual affinity tag and a TEV protease cleavage site.

SARS-CoV nsp1 expression

Several E. coli host strains [BL21(DE3), HMS174(DE3), Rosetta(DE3) (Novagen) and BL21 Star(DE3) (Invitrogen)] were transformed with the verified pHisGST-TEV-Snsp1 expression vector for expression and solubility assays. Small-scale expression cultures were grown from these transformed cells, testing media (Terrific Broth, TB; Luria Broth, LB), temperature (37 or 22 °C), and time post-induction. All cultures included ampicillin (50 µg/mL) and were induced with 1 mM isopropyl-β-D-thiogalactopyranoside (IPTG, Inalco) at an OD600 of 0.5-1.0. The Rosetta(DE3) cultures also included chloramphenicol (34 µg/mL). An aliquot of each culture was lysed and then fractionated into supernatant and insoluble pellet, and both fractions from each culture were analyzed by SDS-PAGE.

Large-scale expression used transformed Rosetta(DE3) cells grown in TB in the presence of chloramphenicol (34 µg/mL) and ampicillin (50 µg/mL). One-liter cultures were grown to an OD600 of ~0.65 and then induced with 1 mM IPTG. A constant temperature of 37 °C and shaking at 260 rpm in a New Brunswick Scientific I250KC incubated shaker were maintained during growth and induction. Cells were harvested 3 h post-induction by centrifugation at 6000 × g (Beckman Coulter Avanti J-20 XPI centrifuge). The supernatant was decanted, and cell pellets were scraped into a sterile 50 mL Falcon tube for immediate storage at -80 °C.

Recombinant SARS-CoV nsp1 purification

Sample preparation for recombinant SARS-CoV nsp1 isolation began by thawing frozen cell pellets, harvested from 6 × 1 L cultures, in a 22 °C water bath. The thawed pellets were resuspended in a total of 80 mL of lysis buffer (50 mM Hepes, pH 7.5, 250 mM NaCl, 1 mM β-mercaptoethanol and 10 mM imidazole), supplemented with 3 mL Protease Inhibitor Cocktail (Sigma, P2714) prepared according to the manufacturer's protocol. Cells were lysed using a single 15,000-18,000 psi pass through a Microfluidizer Processor M-110EH (Microfluidics), and the lysate was fractionated by high-speed centrifugation at 97,272 × g (Beckman L-60 Ultracentrifuge; 45Ti rotor; 30,000 rpm). The soluble fraction containing SARS-CoV nsp1 was identified by SDS-PAGE and protein immunoblot (Western blot) employing an anti-polyhistidine antibody and the WesternBreeze Chromogenic Kit (Invitrogen).

The lysate supernatant was filtered through a 0.45 µm membrane, and the clarified sample was loaded onto a 5 mL HisTrap immobilized metal affinity chromatography (IMAC) column (Amersham) equilibrated with binding buffer (50 mM Hepes, pH 7.5, 250 mM NaCl, 1 mM β-mercaptoethanol and 15 mM imidazole). Chromatography steps were performed on an ÄKTA FPLC system (Amersham) unless otherwise noted. Following sample loading, the column was washed in two steps to remove weakly bound contaminants, using binding buffer in which the imidazole concentration was raised first to 25 mM and then to 40 mM. The HisGST-TEV-Snsp1 fusion protein was eluted over a 20 column volume imidazole gradient (15-300 mM), with a final step to 500 mM imidazole to strip the column of any remaining protein. Throughout purification, sample purity was analyzed by Coomassie Brilliant Blue-stained SDS-PAGE, and the percentage of the total sample comprised of recombinant SARS-CoV nsp1 was estimated using the densitometry feature of the AlphaImager HP gel imaging system.
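When planning fraction collection, the imidazole concentration delivered at any point of the elution gradient follows from the column volume and gradient length. The sketch below assumes an ideal linear gradient and a column volume of exactly 5 mL, and ignores system dead volume and mixing delay; the constant and function names are illustrative.

```python
# Ideal linear IMAC gradient: 15 -> 300 mM imidazole over 20 column volumes.
# Assumes a 5 mL column volume and ignores dead volume / mixing delay.

COLUMN_VOLUME_ML = 5.0
GRADIENT_CV = 20
START_MM, END_MM = 15.0, 300.0


def imidazole_at(volume_ml: float) -> float:
    """Imidazole (mM) after volume_ml of gradient has been delivered."""
    gradient_ml = GRADIENT_CV * COLUMN_VOLUME_ML       # 100 mL in total
    fraction = min(max(volume_ml / gradient_ml, 0.0), 1.0)
    return START_MM + fraction * (END_MM - START_MM)


for v in (0, 25, 50, 100):
    print(f"{v:>3} mL -> {imidazole_at(v):.2f} mM")
```

Under these assumptions the midpoint of the gradient (50 mL) corresponds to 157.5 mM imidazole.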

Cleavage of fusion protein 

TEV protease with an N-terminal polyhistidine affinity tag was added to the IMAC-purified HisGST-TEV-Snsp1 sample at a 1:50 mass ratio to cleave the fusion protein, and the sample was simultaneously dialyzed into Buffer A (50 mM Hepes, pH 7.5, 250 mM NaCl and 1 mM β-mercaptoethanol) overnight at 4 °C. The TEV protease-treated SARS-CoV nsp1 was separated from the protease, the cleaved HisGST affinity tag and any remaining uncleaved HisGST-Snsp1 fusion protein by loading the sample onto a second 5 mL HisTrap IMAC column equilibrated with Buffer A. Imidazole gradients were created using Buffer B, which was identical to Buffer A with the addition of 500 mM imidazole. Following sample loading, a step gradient was applied to raise the imidazole concentration to 25 mM, and the column was washed to elute SARS-CoV nsp1, now lacking the affinity tag. A final sharp linear gradient raising the imidazole concentration to 500 mM was performed to elute the species possessing a polyhistidine tag (e.g., TEV protease and the cleaved HisGST tag). The SARS-CoV nsp1-containing fractions were pooled and concentrated to 18.3 mg/mL in an Amicon Ultra-15 Centrifugal Filter Unit with a 5000 MWCO membrane (Millipore) by centrifugation at 2000 × g at 4 °C. The protein was prepared for size exclusion chromatography (SEC) by dialysis overnight at 4 °C against SEC buffer [25 mM Hepes, pH 7.5, 150 mM NaCl, 1 mM EDTA and 5 mM dithiothreitol (DTT)].
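The cleavage and second IMAC steps involve two simple calculations: the percentage of Buffer B needed to reach a given imidazole concentration, and the amount of TEV protease for a 1:50 mass ratio. The Python sketch below assumes the ratio is protease to fusion protein by mass, and uses the ~60 mg of fusion protein listed for IMAC #1 in Table 1 purely as an example input.

```python
# Buffer A/B mixing and TEV protease dosing for the cleavage + second IMAC step.
# Assumes the 1:50 ratio is protease : fusion protein by mass.

BUFFER_B_IMIDAZOLE_MM = 500.0  # Buffer B = Buffer A + 500 mM imidazole


def percent_b(target_mm: float) -> float:
    """Percent Buffer B giving the target imidazole concentration (mM)."""
    return 100.0 * target_mm / BUFFER_B_IMIDAZOLE_MM


def tev_mass_mg(substrate_mg: float, ratio: float = 50.0) -> float:
    """TEV protease mass (mg) for a 1:ratio protease:substrate mass ratio."""
    return substrate_mg / ratio


print(percent_b(25.0))    # 5.0 -> 5% B for the 25 mM wash step
print(tev_mass_mg(60.0))  # 1.2 mg TEV for an example 60 mg of fusion protein
```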

Size exclusion chromatography and multi-angle light scattering

A Superdex 200 HL 16/60 SEC column (Amersham) was employed as a final polishing step to remove aggregated protein and low molecular weight contaminants. The column was equilibrated with SEC buffer. Preparative SEC was run at 1.0 mL/min at 4 °C. The resulting fractions were analyzed by SDS-PAGE, and those containing pure SARS-CoV nsp1 were pooled. The pooled protein was concentrated as required for additional experiments.

A Superdex 200 HR 10/30 SEC column (Amersham) was used to estimate the molecular mass and oligomeric state of the purified SARS-CoV nsp1. This column was equilibrated with SEC buffer and run at 0.5 mL/min at 4 °C. A calibration curve for molecular size estimation was generated by individually loading blue dextran 2000, bovine serum albumin (BSA), chymotrypsinogen A, and aprotinin onto this analytical SEC column and eluting under similar conditions. These data were entered into Unicorn v.5.0.1 (Amersham) to calculate a retention volume vs. molecular weight calibration curve.
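A retention volume vs. log(MW) calibration of this kind can be sketched as an ordinary least-squares fit. The Python below fits the four protein standards listed in Table 2, leaving out blue dextran, which marks the void volume. This is an assumption about how such a curve may be built: Unicorn's exact fitting procedure is not described here, so the coefficients from this plain fit are not expected to match the values reported with Table 2.

```python
import math

# Protein standards from Table 2: name -> (MW in kDa, retention volume in mL).
# Blue dextran 2000 marks the void volume and is left out of the fit.
STANDARDS = {
    "BSA (dimer)": (132.6, 11.44),
    "BSA (monomer)": (66.3, 13.12),
    "Chymotrypsinogen A": (25.0, 15.79),
    "Aprotinin": (6.5, 17.70),
}


def fit_calibration(points):
    """Ordinary least squares of retention volume on log10(MW).

    Returns (slope, intercept, Pearson r)."""
    xs = [math.log10(mw) for mw, _ in points]
    ys = [volume for _, volume in points]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxy / math.sqrt(sxx * syy)


slope, intercept, r = fit_calibration(list(STANDARDS.values()))
print(f"V_R = {slope:.3f} * log10(MW) + {intercept:.2f}  (r = {r:.4f})")
```

The negative slope reflects the basic principle of SEC: larger molecules are excluded from more of the resin pore volume and therefore elute earlier.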

Size exclusion chromatography coupled with multi-angle light scattering (SEC-MALS) experiments employed the same analytical SEC column installed on an ÄKTA Purifier modified to include differential refractive index and multi-angle light scattering (MALS) detectors (Optilab DSP and miniDAWN, respectively; both Wyatt) downstream of the Purifier's standard UV flow cell. The system was extensively equilibrated with SEC buffer at 0.5 mL/min at 4 °C. Purified SARS-CoV nsp1 (200 µL at 3 mg/mL) was injected onto the column and eluted at 0.5 mL/min at 4 °C. ASTRA software (Wyatt) was used to evaluate the MALS data.
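The light-scattering relation underlying MALS mass estimates, in the dilute limit, is M ≈ R(0)/(K* c), where R(0) is the excess Rayleigh ratio extrapolated to zero angle, c is the concentration from the refractive index detector, and K* = 4π²n₀²(dn/dc)²/(N_Av λ⁴), with N_Av Avogadro's number. The sketch below neglects the second virial coefficient and angular extrapolation and is not a description of how ASTRA processes data. All numerical inputs (n₀ = 1.33, dn/dc = 0.185 mL/g, a 690 nm laser, and a synthetic R(0)) are illustrative assumptions, not values from this study.

```python
import math

AVOGADRO = 6.02214076e23  # mol^-1


def optical_constant(n0: float, dndc_ml_per_g: float, wavelength_nm: float) -> float:
    """K* = 4*pi^2*n0^2*(dn/dc)^2 / (N_Av * lambda^4), in mol cm^2 g^-2."""
    wavelength_cm = wavelength_nm * 1e-7
    return (4 * math.pi ** 2 * n0 ** 2 * dndc_ml_per_g ** 2) / (AVOGADRO * wavelength_cm ** 4)


def molar_mass(rayleigh_ratio_per_cm: float, conc_g_per_ml: float, k_star: float) -> float:
    """Dilute-limit molar mass (g/mol), M = R(0) / (K* c); A2 is neglected."""
    return rayleigh_ratio_per_cm / (k_star * conc_g_per_ml)


# Illustrative inputs (assumptions, not measured values from this study).
k = optical_constant(n0=1.33, dndc_ml_per_g=0.185, wavelength_nm=690.0)
c = 1.0e-3                   # 1 mg/mL expressed in g/mL
r0 = k * c * 20_200.0        # synthetic R(0) for a 20.2 kDa monomer
print(round(molar_mass(r0, c, k)))  # 20200
```

Unlike column calibration, this estimate does not depend on where the protein elutes, only on the scattered intensity and the concentration measured at each slice of the peak.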

Circular dichroism 

SARS-CoV nsp1 was concentrated to 5 mg/mL and dialyzed against 10 mM phosphate buffer, pH 7.5 (0.26 g monosodium phosphate monohydrate and 2.2 g disodium phosphate heptahydrate per 1 L), in preparation for circular dichroism (CD) analysis on a Jasco J-715 spectropolarimeter. SARS-CoV nsp1 dilutions of 1:3, 1:4, 1:5, 1:10, 1:50, 1:100, and 1:150 were measured in one of three cells (1.0 cm, 1.0 mm and 0.1 mm path length) at 20 °C to determine optimal conditions. Standard Analysis (Jasco) program files were extracted and further analyzed using the k2d web server (www.embl-heidelberg.de/~andrade/k2d/) to estimate secondary structure composition [23]. Mean residue ellipticity ($[\theta]$, expressed in deg cm$^2$ dmol$^{-1}$) was calculated using $[\theta] = \theta \times 100 \times M_r / (c \times l \times N_A)$, where $\theta$ is the experimental ellipticity in degrees (the instrument reading in mdeg divided by 1000), $M_r$ is the protein's molecular weight in Daltons, $c$ is the protein concentration in mg/mL, $l$ is the cuvette path length in cm and $N_A$ is the number of residues in the protein.
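The buffer recipe and the mean residue ellipticity formula can be checked numerically. In this Python sketch the salt molar masses are standard values for the monohydrate and heptahydrate. The -10 mdeg ellipticity reading is hypothetical, chosen only to show the unit handling, while the 20.2 kDa mass and 186 residues (180 nsp1 residues plus the six vector-derived residues) are taken from the Results. The reading is divided by 1000 so that θ is in degrees, the unit for which this form of the equation gives deg cm² dmol⁻¹.

```python
# Molar masses (g/mol) of the hydrated phosphate salts used for the CD buffer.
NAH2PO4_H2O = 137.99
NA2HPO4_7H2O = 268.07


def phosphate_mm(g_mono: float, g_di: float, volume_l: float = 1.0) -> float:
    """Total phosphate concentration (mM) from grams of each salt."""
    return 1000.0 * (g_mono / NAH2PO4_H2O + g_di / NA2HPO4_7H2O) / volume_l


def mean_residue_ellipticity(theta_mdeg, mr_da, conc_mg_ml, path_cm, n_residues):
    """[theta] in deg cm^2 dmol^-1 from an instrument reading in mdeg."""
    theta_deg = theta_mdeg / 1000.0  # convert mdeg to deg before applying the formula
    return theta_deg * 100.0 * mr_da / (conc_mg_ml * path_cm * n_residues)


print(round(phosphate_mm(0.26, 2.2), 2))  # 10.09
# Hypothetical -10 mdeg reading; 20.2 kDa, 186 residues, 1.0 mg/mL, 0.1 mm (0.01 cm) cell.
print(round(mean_residue_ellipticity(-10.0, 20_200, 1.0, 0.01, 186)))  # -10860
```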

Secondary structure was predicted from the recombinantly expressed SARS-CoV nsp1 amino acid sequence (post-affinity tag cleavage) using SCRATCH [24], PSIPRED [25], PROFsec [26], Sable-2 [27], and Predator [28]. A consensus secondary structure prediction was made from these individual prediction results and compared with the secondary structure content measured experimentally by CD.
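The text does not specify how the five predictions were combined into a consensus. One simple possibility, shown here only as an assumed illustration, is a per-residue majority vote over three-state (H, E, C) strings, with ties assigned to coil. The toy predictions are invented and do not correspond to nsp1.

```python
from collections import Counter


def consensus(predictions, tie="C"):
    """Per-residue majority vote over equal-length H/E/C strings; ties -> tie state."""
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictions must cover the same residues")
    result = []
    for states in zip(*predictions):
        ranked = Counter(states).most_common()
        top_state, top_count = ranked[0]
        if len(ranked) > 1 and ranked[1][1] == top_count:
            result.append(tie)  # two states share the highest count
        else:
            result.append(top_state)
    return "".join(result)


def composition(ss):
    """Percent of residues in each state, rounded to whole numbers."""
    return {state: round(100 * ss.count(state) / len(ss)) for state in "HEC"}


# Toy predictions from five hypothetical methods for a 10-residue peptide.
toy = ["HHHHCCEEEE", "HHHCCCEEEC", "HHHHCCEECC", "CHHHCCEEEE", "HHHHHCEEEE"]
cons = consensus(toy)
print(cons, composition(cons))  # HHHHCCEEEE {'H': 40, 'E': 40, 'C': 20}
```

The composition of the consensus string is what would then be compared with the CD-derived fractions.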
Results

Construction of SARS-CoV nsp1 bacterial expression plasmid

An E. coli vector (Topo-HisGST-YZW; Yanzhou Wang, unpublished) was employed to construct an expression vector producing full-length SARS-CoV nsp1. The plasmid backbone is based on pET-15b, retaining the advantages of this pET vector (i.e., the powerful but stringent T7lac promoter and ampicillin resistance) while introducing a dual N-terminal affinity tag (polyhistidine and GST), a highly specific TEV protease cleavage site, and topoisomerase ligation (Fig. 1b). All modifications to the pET-15b plasmid lie between the unique BamHI and NcoI sites.

The SARS-CoV nsp1 cDNA fragment was produced by PCR using a template generated by RT-PCR from the 5' end of the viral RNA genome. A STOP codon was introduced during the PCR step because the template cDNA did not contain one immediately 3' to the nsp1 coding sequence: wild-type SARS-CoV nsp1 is proteolytically processed from the N-terminal end of the large pp1a and pp1ab polyproteins (486 and 790 kDa, respectively). The PCR-amplified cDNA was 543 nt long, plus 3' adenine overhangs resulting from the use of Taq polymerase in the PCR. These overhangs are required for topoisomerase TA cloning. The completeness of the pHisGST-TEV-Snsp1 expression vector was confirmed by DNA sequencing.

Expression and purification of the fusion protein 

The optimal expression conditions for the SARS-CoV nsp1 fusion protein in 5 mL cultures were determined to be the Rosetta(DE3) E. coli strain in TB medium containing ampicillin and chloramphenicol, with growth and expression at 37 °C and harvest 3 h post-induction. These conditions produced enough soluble His-GST-nsp1 fusion protein for the desired structural and biophysical studies that further expression optimization was not required.

Expression was easily scaled up to 1 L cultures grown in 2.8 L Fernbach flasks. The initial step of fusion protein purification was IMAC employing a nickel-charged column. The thawed and resuspended pellets were lysed using a Microfluidizer. The Microfluidizer not only lysed the cells efficiently in a single pass, but the resulting supernatant was also less viscous than that obtained by other methods (e.g., sonication), which made loading the sample onto the IMAC column easier. The SARS-CoV nsp1 fusion protein eluted as a single but somewhat broad peak in a linear gradient of increasing imidazole concentration. Fractions were pooled based on protein purity, as judged by Coomassie Brilliant Blue-stained SDS-PAGE (Fig. 2a). The purity of the pooled fusion protein post-IMAC was 80% (Table 1), with a single major band at ~50 kDa, as expected for the fusion protein (SARS-CoV nsp1 at 19.6 kDa + HisGST-TEV fusion tag at
28.1 kDa). Two major contaminants, running at ~30 kDa and ~65 kDa, and several minor contaminants were also observed.

Fig. 2. Coomassie-stained SDS-PAGE analysis of SARS-CoV nsp1 at various stages of purification. (a) Gel displaying fractions from the initial IMAC purification (IMAC #1) and the second IMAC following cleavage of the dual affinity tag (IMAC #2). Lane 1, Mark 12 molecular weight marker (Invitrogen); lane 2, total cell lysate; lane 3, lysate supernatant; lane 4, IMAC #1 flow-through; lane 5, IMAC #1 wash; lane 6, IMAC #1 HisGST-Snsp1 elution; lanes 7-9, IMAC #2 flow-through fractions containing SARS-CoV nsp1; lane 10, IMAC #2 elution of cleaved dual affinity tag. (b) Gel analysis of preparative SEC. Lane 1, Mark 12 molecular weight marker (Invitrogen); lane 2, sample loaded onto SEC column; lanes 3-4, pooled SARS-CoV nsp1 eluted from SEC, loaded onto the gel at 3 and 6 µg, respectively.

Table 1
Yield of recombinant SARS-CoV nsp1 purified from E. coli

Step                      Total protein (mg)ᵃ   Purity (%)ᶜ   His-GST-nsp1 (mg)   SARS-CoV nsp1 (mg)
Lysate (soluble)ᵇ         1080                  28ᵈ           300                 -
IMAC #1                   75                    80ᵈ           60                  -
Tag cleavage + IMAC #2    30                    80ᵉ           0                   24
SEC                       21                    99ᵉ           0                   21

ᵃ Estimated by Bradford assay; fraction containing SARS-CoV nsp1.
ᵇ From 6 L culture.
ᶜ Estimated from densitometry on Coomassie-stained SDS-PAGE gels.
ᵈ Purity of the His-GST-nsp1 fusion protein.
ᵉ Purity of SARS-CoV nsp1 post-affinity tag cleavage.

Table 2
Molecular weight estimation by SEC

Sample               Typeᵃ   Mol. weight (kDa)ᵇ   Retention vol. (mL)   Est. mol. weight (kDa)
Blue Dextran 2000    S       ~2000                7.33                  -
BSA (dimer)          S       132.6                11.44                 -
BSA (monomer)        S       66.3                 13.12                 -
Chymotrypsinogen A   S       25.0                 15.79                 -
Aprotinin            S       6.5                  17.70                 -
SARS-CoV nsp1        U       20.2                 14.45                 37.2

Superdex 200 HR 10/30 on ÄKTA Purifier. Calibration: retention vol. (mL) = A × log(MW) + B, with MW in kDa; A = -4.291, B = 21.19, correlation = -0.9928.
ᵃ Type: standard (S) or unknown (U).
ᵇ MW calculated from amino acid sequence.

TEV protease cleavage of fusion protein

The HisGST dual affinity fusion tag was cleaved from SARS-CoV nsp1 using TEV protease carrying its own (His)6 affinity tag. This protease retains a useful level of activity over a wide range of buffer conditions and temperatures, so cleavage can be performed during the dialysis step that removes imidazole rather than performing cleavage and dialysis separately. Cleavage was nearly complete after overnight incubation at 4 °C. The cleaved fusion tag, TEV protease, and any remaining uncleaved fusion protein were separated from the now tagless SARS-CoV nsp1 using a second IMAC column. Several of the minor contaminants, presumably E. coli proteins that co-eluted with the fusion protein in the initial IMAC run, were resolved from the cleaved SARS-CoV nsp1 during this step. A single major contaminant, running at ~65 kDa on SDS-PAGE, remained.

Preparative size exclusion chromatography 

Preparative-scale SEC was used as the final purification step. Prior to SEC, the SARS-CoV nsp1 sample was concentrated to minimize the volume applied to the SEC column and thereby enhance resolution. The concentrated sample was stable in SEC buffer and could be stored at 4 °C for several days with no observed precipitation or degradation. The SEC elution profile included a small, early-eluting peak corresponding to the high molecular weight contaminant and a single large, well-formed peak corresponding to SARS-CoV nsp1. Coomassie Brilliant Blue-stained SDS-PAGE analysis (Fig. 2b) indicated that the SEC-purified SARS-CoV nsp1 was 99% pure and that the major contaminant at ~65 kDa had been removed. The protocol yielded 21 mg of purified protein (3.5 mg per 1 L culture).

Estimating molecular weight and oligomerization state 

The molecular weight and oligomeric state of the SEC-purified SARS-CoV nsp1 in its native, soluble state were estimated by two methods: standard analytical SEC with a calibration curve derived from well-behaved protein standards, and SEC-MALS. The same Superdex 200 HR 10/30 column and the same SEC buffer were used in both techniques, and the protein eluted as a single well-formed peak in all runs. For the standard SEC size estimation, the purified SARS-CoV nsp1 reproducibly eluted at 14.45 mL, corresponding to a molecular weight estimate of 37.2 kDa (Table 2). The calculated molecular weight of the recombinantly expressed SARS-CoV nsp1 is 20.2 kDa, including six vector-derived N-terminal residues (GSLDAL) remaining after cleavage. Thus, the molecular weight of 37.2 kDa estimated by SEC implies that SARS-CoV nsp1 is present as a dimer (37.2/20.2 kDa = 1.84) in solution. The preparative-scale SEC column was also calibrated using protein standards, and the molecular weight estimated from this larger column confirmed the analytical SEC results (data not shown). The molecular weight estimated by SEC-MALS, however, was 19.6 kDa, even though the protein eluted at a similar volume as in the standard SEC run. These data imply that SARS-CoV nsp1 is present as a monomer (19.6/20.2 kDa = 0.97) in solution, in contrast to the standard SEC results. The discrepancy between these results is discussed below.
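The SEC estimate can be reproduced by inverting the calibration reported with Table 2, log(MW) = (V_R - B)/A. The short Python sketch below does this for the 14.45 mL retention volume and computes the oligomer ratios quoted above for both SEC and SEC-MALS. It assumes the logarithm in the calibration is base 10, which is consistent with the 37.2 kDa value in Table 2.

```python
# Calibration reported with Table 2: V_R (mL) = A * log10(MW in kDa) + B.
A, B = -4.291, 21.19
MONOMER_KDA = 20.2  # calculated mass of recombinant nsp1 (incl. GSLDAL)


def mw_from_retention(v_ml: float) -> float:
    """Estimate MW (kDa) by inverting the calibration line."""
    return 10 ** ((v_ml - B) / A)


sec_mw = mw_from_retention(14.45)
print(round(sec_mw, 1), round(sec_mw / MONOMER_KDA, 2))  # 37.2 1.84
print(round(19.6 / MONOMER_KDA, 2))                       # SEC-MALS ratio: 0.97
```

Because the calibration is logarithmic, a small shift in retention volume changes the estimate multiplicatively; a shape or interaction effect that delays or advances elution therefore moves the SEC estimate without affecting the MALS result.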

Circular dichroism 

The purified SARS-CoV nsp1 was subjected to CD analysis to determine its secondary structure composition experimentally. A 0.1 mm path length cell and a minimal phosphate buffer were used to minimize buffer effects on the measured spectrum. The secondary structure composition estimated from the CD spectrum was 28% α-helix, 33% β-strand, and 39% random coil. By comparison, the consensus secondary structure composition predicted from the amino acid sequence alone was 26% α-helix, 24% β-strand, and 50% random coil. See Fig. 3.

Discussion 

SARS-CoV nsp1 has been successfully produced in a recombinant E. coli expression system, meeting the goal of producing milligram quantities of highly purified protein for structural and biophysical study. The purified sample was stable in solution at concentrations from 1 mg/mL to ~18 mg/mL, was present as a single well-defined oligomeric state, and possessed a significant amount of secondary structure.

Fig. 3. CD spectrum of SARS-CoV nsp1 at 1.0 mg/mL using a 0.1 mm path length cell.

The use of an expression vector encoding a dual affinity tag (polyhistidine and GST) allows purification by two orthogonal affinity methods. It is not unusual for a small number of native E. coli proteins to co-elute with a recombinantly expressed fusion protein containing a single affinity tag purified on the appropriate affinity resin, whereas it is unusual for a native E. coli protein to bind effectively to both IMAC and glutathione resins. We have successfully used this dual tag/dual affinity column procedure to highly purify a number of proteins, and in at least one instance (mouse HoxA5 homeodomain) the presence of the dual affinity tag dramatically increased expression compared to the corresponding fusion protein possessing only a polyhistidine tag (Umland, unpublished data). The Topo-HisGST-pET15bTEV vector was chosen for expression of recombinant SARS-CoV nsp1 for these reasons. During development of the purification protocol, however, sufficiently high purity was obtained by IMAC followed by removal of high molecular weight aggregates by SEC. Nevertheless, the GST portion of the affinity tag provides options for future purifications, if required.

Both the fusion protein and SARS-CoV nsp1 following removal of the affinity tag were stable in solution under the conditions described for purification and characterization. The pH was maintained near neutrality, but ionic strength varied significantly during the experiments, ranging from only 10 mM phosphate buffer up to 250 mM NaCl. The protein remained in solution in monomeric form following storage at 4 °C for one week. For long-term storage, the protein was flash frozen in small aliquots using liquid nitrogen and then stored at -80 °C. Preparing a stable protein sample was an important goal and is a prerequisite for investing significant effort in structural and biophysical studies. The protein was also resistant to proteolytic degradation. Although no explicit proteolytic digestion assays were performed, there was no indication that native E. coli proteases caused any observable degradation either pre- or post-lysis. The lack of proteolytic degradation is important for easily obtaining a homogeneous sample, and it suggests that the protein adopts a compact, globular fold that hinders proteolysis.

Circular dichroism was used to determine the secondary
structure composition of the purified SARS-CoV nsp1
(Fig. 3). The experimentally derived composition was in
reasonable agreement with the consensus prediction based
on amino acid sequence alone. The major deviation between
experiment and prediction was that the experimental data
indicated more β-strand than expected and, correspondingly,
less random coil. Having more residues in a regular secondary
structure conformation likely contributes to the stability of
the protein and is an indication of a well-folded protein.
However, it should be noted that the program k2d, used to
analyze the CD data, classifies as random coil all residues
that do not participate in an α-helix or a β-strand, and this
term does not imply that such residues lack a defined and
stable structure within a given protein. The experimentally
determined composition of 28% α-helix, 33% β-strand, and 39%
random coil is consistent with values observed for other
proteins having a globular fold. For example, using the
same CD protocol, we have determined the secondary
structure composition of another SARS-CoV protein
(nsp9) to be 10% α-helix, 39% β-strand, and 51% random
coil (unpublished results). These values agree closely, within
4 percentage points for each class, with the values (13% α-helix,
35% β-strand, and 52% random coil) calculated from its crystal
structure (PDB: 1UW7). Hen egg white lysozyme (PDB: 1HEW) and
bovine trypsin (PDB: 1GBT) have approximately 50% and 54%,
respectively, of their residues in conformations other than
α-helix or β-strand, based upon their crystal structures.
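As a simple arithmetic check on the comparisons above, the following Python sketch tabulates the compositions quoted in this section, confirms that each sums to 100%, and computes the per-class absolute difference between the nsp9 CD estimate and the crystal-structure values. The dictionary and helper names are hypothetical, and the sketch does not reproduce the k2d analysis itself; it only reuses the percentages already reported.

```python
# Secondary structure compositions (percent) quoted in the text.
# "coil" follows the k2d convention: any residue not in an alpha-helix or beta-strand.
COMPOSITIONS = {
    "nsp1 (CD)": {"helix": 28, "strand": 33, "coil": 39},
    "nsp9 (CD)": {"helix": 10, "strand": 39, "coil": 51},
    "nsp9 (crystal, PDB 1UW7)": {"helix": 13, "strand": 35, "coil": 52},
}


def total(comp):
    """Sum of all class percentages; should be 100 for a complete assignment."""
    return sum(comp.values())


def abs_differences(a, b):
    """Per-class absolute difference in percentage points."""
    return {k: abs(a[k] - b[k]) for k in a}


for name, comp in COMPOSITIONS.items():
    print(f"{name}: sum = {total(comp)}%")

diff = abs_differences(COMPOSITIONS["nsp9 (CD)"], COMPOSITIONS["nsp9 (crystal, PDB 1UW7)"])
print("nsp9 CD vs crystal (percentage points):", diff)
```

The largest discrepancy for nsp9 is in the β-strand class (4 percentage points), which is the basis for describing the CD and crystal-structure values as closely matched.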

Molecular weight estimation by SEC calibrated against
the elution volumes of several well-behaved protein
standards is a well-established procedure. SEC can also
provide an estimate of the oligomeric state of the protein in
solution under the chosen buffer conditions. This method gave
an estimated molecular weight of ~37 kDa for the purified
recombinant SARS-CoV nsp1 and indicated that it was present
predominantly as a single species. Because the calculated
mass of a monomer is 20.2 kDa, these data could be interpreted
as the protein being present as a dimer in solution. However,
molecular weight estimation by traditional SEC is limited by
two assumptions: that the protein sample interacts with the
column resin in an ideal fashion (e.g., no electrostatic or
hydrophobic interactions), and that the individual protein
particles (monomers or complexes) are approximately
spherical, since the elution profile is influenced not only by
molecular weight but also by molecular shape.
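Conventional SEC molecular weight estimation assumes that log10(MW) varies approximately linearly with elution volume (or, equivalently, with the partition coefficient Kav) across the column's fractionation range. The Python sketch below fits that line by ordinary least squares and interpolates an unknown. The standard elution volumes and masses are hypothetical, chosen to lie exactly on an assumed line so the fit can be checked; they are not data from this study. The last lines reuse the ~37 kDa and 20.2 kDa values from the text to show the apparent oligomer ratio that led to the dimer interpretation.

```python
import math


def fit_calibration(elution_ml, mw_kda):
    """Ordinary least-squares fit of log10(MW) = intercept + slope * Ve."""
    xs = list(elution_ml)
    ys = [math.log10(m) for m in mw_kda]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_mw(ve_ml, slope, intercept):
    """Interpolate molecular weight (kDa) from an elution volume (mL)."""
    return 10 ** (intercept + slope * ve_ml)


# HYPOTHETICAL standards lying exactly on log10(MW) = 3.0 - 0.1 * Ve
standards_ve = [10.0, 12.0, 14.0, 16.0]
standards_mw = [10 ** (3.0 - 0.1 * v) for v in standards_ve]

slope, intercept = fit_calibration(standards_ve, standards_mw)
unknown_mw = estimate_mw(15.0, slope, intercept)  # hypothetical elution volume
print(f"slope={slope:.3f}, intercept={intercept:.3f}, MW ~ {unknown_mw:.1f} kDa")

# Apparent oligomeric state from the values reported in the text
ratio = 37.0 / 20.2
print(f"SEC MW / calculated monomer mass = {ratio:.2f}")
```

Because the calibration maps elution volume to mass only through hydrodynamic size, a non-spherical monomer or a protein that interacts with the resin will be assigned a misleading mass; the ratio of ~1.8 is therefore only suggestive of a dimer.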

SEC-MALS employs a combination of light scattering
and refractive index detectors to continuously monitor
the SEC eluate, providing molecular weight estimates
unaffected by a sample's non-ideal interaction with the
SEC resin. The sole role of SEC in a SEC-MALS experiment
is to maximize the homogeneity of the sample being
analyzed by MALS at any given instant, as MALS provides
a weight-average molecular weight of all species in the
aliquot under analysis. Molecular weight estimation by MALS
is largely independent of molecular shape, so SEC-MALS
results are influenced substantially less by non-ideal
sample behavior than SEC alone. Analysis of 14 protein
standards showed that the SEC-MALS method can routinely
estimate molecular weights of native proteins within 5% [29].
SEC-MALS indicates that purified SARS-CoV nsp1 is present
as a monodisperse monomeric population of 19.6 kDa in
solution. The disagreement between the SEC and SEC-MALS
molecular weight estimates may be due to non-ideal
interactions between the protein and the SEC medium, but it
more likely indicates that the protein's shape deviates
significantly from spherical (e.g., oblate or prolate).
Because molecular weights estimated by SEC-MALS are more
accurate than those estimated from SEC alone, SARS-CoV nsp1
is most likely present as a monomer in solution.
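Light scattering reports a weight-average molar mass, M_w = Σ c_i M_i / Σ c_i, where c_i is the mass concentration of species i in the volume being measured. The Python sketch below illustrates why the upstream SEC separation matters: a hypothetical monomer/dimer mixture co-eluting in the same slice yields an intermediate value that corresponds to neither species. It also computes the relative deviation of the reported 19.6 kDa estimate from the calculated 20.2 kDa monomer mass. The function name and the 50:50 mixture are illustrative assumptions.

```python
def weight_average_mw(species):
    """Weight-average molar mass from (mass_concentration, molar_mass) pairs."""
    total_c = sum(c for c, _ in species)
    return sum(c * m for c, m in species) / total_c


MONOMER_KDA = 20.2  # calculated monomer mass from the text

pure_monomer = weight_average_mw([(1.0, MONOMER_KDA)])
# HYPOTHETICAL 50:50 (by mass) monomer/dimer mixture co-eluting in one slice
mixture = weight_average_mw([(0.5, MONOMER_KDA), (0.5, 2 * MONOMER_KDA)])

reported = 19.6  # SEC-MALS estimate from the text
rel_dev = abs(reported - MONOMER_KDA) / MONOMER_KDA
print(f"monomer: {pure_monomer:.1f} kDa, mixture: {mixture:.1f} kDa, "
      f"deviation of reported value: {rel_dev:.1%}")
```

A deviation of about 3% lies within the ~5% accuracy cited for SEC-MALS, whereas a dimer would be expected near 40 kDa, which is consistent with the monomer assignment.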